Goto

Collaborating Authors

 tail distribution


On Goodhart's law, with an application to value alignment

El-Mhamdi, El-Mahdi, Hoang, Lê-Nguyên

arXiv.org Machine Learning

``When a measure becomes a target, it ceases to be a good measure'', this adage is known as {\it Goodhart's law}. In this paper, we investigate formally this law and prove that it critically depends on the tail distribution of the discrepancy between the true goal and the measure that is optimized. Discrepancies with long-tail distributions favor a Goodhart's law, that is, the optimization of the measure can have a counter-productive effect on the goal. We provide a formal setting to assess Goodhart's law by studying the asymptotic behavior of the correlation between the goal and the measure, as the measure is optimized. Moreover, we introduce a distinction between a {\it weak} Goodhart's law, when over-optimizing the metric is useless for the true goal, and a {\it strong} Goodhart's law, when over-optimizing the metric is harmful for the true goal. A distinction which we prove to depend on the tail distribution. We stress the implications of this result to large-scale decision making and policies that are (and have to be) based on metrics, and propose numerous research directions to better assess the safety of such policies in general, and to the particularly concerning case where these policies are automated with algorithms.


Follow-the-Perturbed-Leader with Fr\'{e}chet-type Tail Distributions: Optimality in Adversarial Bandits and Best-of-Both-Worlds

Lee, Jongyeong, Honda, Junya, Ito, Shinji, Oh, Min-hwan

arXiv.org Machine Learning

This paper studies the optimality of the Follow-the-Perturbed-Leader (FTPL) policy in both adversarial and stochastic $K$-armed bandits. Despite the widespread use of the Follow-the-Regularized-Leader (FTRL) framework with various choices of regularization, the FTPL framework, which relies on random perturbations, has not received much attention, despite its inherent simplicity. In adversarial bandits, there has been conjecture that FTPL could potentially achieve $\mathcal{O}(\sqrt{KT})$ regrets if perturbations follow a distribution with a Fr\'{e}chet-type tail. Recent work by Honda et al. (2023) showed that FTPL with Fr\'{e}chet distribution with shape $\alpha=2$ indeed attains this bound and, notably logarithmic regret in stochastic bandits, meaning the Best-of-Both-Worlds (BOBW) capability of FTPL. However, this result only partly resolves the above conjecture because their analysis heavily relies on the specific form of the Fr\'{e}chet distribution with this shape. In this paper, we establish a sufficient condition for perturbations to achieve $\mathcal{O}(\sqrt{KT})$ regrets in the adversarial setting, which covers, e.g., Fr\'{e}chet, Pareto, and Student-$t$ distributions. We also demonstrate the BOBW achievability of FTPL with certain Fr\'{e}chet-type tail distributions. Our results contribute not only to resolving existing conjectures through the lens of extreme value theory but also potentially offer insights into the effect of the regularization functions in FTRL through the mapping from FTPL to FTRL.


Risk-Averse Action Selection Using Extreme Value Theory Estimates of the CVaR

Troop, Dylan, Godin, Frédéric, Yu, Jia Yuan

arXiv.org Machine Learning

The Conditional Value-at-Risk (CVaR) is a useful risk measure in machine learning, finance, insurance, energy, etc. When the CVaR confidence parameter is very high, estimation by sample averaging exhibits high variance due to the limited number of samples above the corresponding threshold. To mitigate this problem, we present an estimation procedure for the CVaR that combines extreme value theory and a recently introduced method of automated threshold selection by Bader et al. (2018). Under appropriate conditions, we estimate the tail risk using a generalized Pareto distribution. We compare empirically this estimation procedure with the naive method of sample averaging, and show an improvement in accuracy for some specific cases. We also show how the estimation procedure can be used in reinforcement learning by applying our method to the multi-armed bandit problem where the goal is to avoid catastrophic risk.


Federated Learning for Ultra-Reliable Low-Latency V2V Communications

Samarakoon, Sumudu, Bennis, Mehdi, Saad, Walid, Debbah, Merouane

arXiv.org Machine Learning

In this paper, a novel joint transmit power and resource allocation approach for enabling ultra-reliable low-latency communication (URLLC) in vehicular networks is proposed. The objective is to minimize the network-wide power consumption of vehicular users (VUEs) while ensuring high reliability in terms of probabilistic queuing delays. In particular, a reliability measure is defined to characterize extreme events (i.e., when vehicles' queue lengths exceed a predefined threshold with non-negligible probability) using extreme value theory (EVT). Leveraging principles from federated learning (FL), the distribution of these extreme events corresponding to the tail distribution of queues is estimated by VUEs in a decentralized manner. Finally, Lyapunov optimization is used to find the joint transmit power and resource allocation policies for each VUE in a distributed manner. The proposed solution is validated via extensive simulations using a Manhattan mobility model. It is shown that FL enables the proposed distributed method to estimate the tail distribution of queues with an accuracy that is very close to a centralized solution with up to 79\% reductions in the amount of data that need to be exchanged. Furthermore, the proposed method yields up to 60\% reductions of VUEs with large queue lengths, without an additional power consumption, compared to an average queue-based baseline. Compared to systems with fixed power consumption and focusing on queue stability while minimizing average power consumption, the reduction in extreme events of the proposed method is about two orders of magnitude.


Robustness of Anytime Bandit Policies

Salomon, Antoine, Audibert, Jean-Yves

arXiv.org Machine Learning

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1-1/n, the regret of the policy is of order log(n). They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.